Machine translation for Arabic dialects (survey)
Similar resources
Machine Translation of Arabic Dialects
Arabic Dialects present many challenges for machine translation, not least of which is the lack of data resources. We use crowdsourcing to cheaply and quickly build Levantine-English and Egyptian-English parallel corpora, consisting of 1.1M words and 380k words, respectively. The dialectal sentences are selected from a large corpus of Arabic web text, and translated using Amazon’s Mechanical Tur...
Machine-Translation History and Evolution: Survey for Arabic-English Translations
As a result of the rapid changes in information and communication technology (ICT), the world has become a small village where people from all over the world connect with each other in dialogue and communication via the Internet. Communication has also become a daily routine activity due to the new globalization, in which companies and even universities have become global, residing across countries’ bo...
Arabic Preprocessing Schemes for Statistical Machine Translation
In this paper, we study the effect of different word-level preprocessing decisions for Arabic on SMT quality. Our results show that given large amounts of training data, splitting off only proclitics performs best. However, for small amounts of training data, it is best to apply English-like tokenization using part-of-speech tags, and sophisticated morphological analysis and disambiguation. Mor...
Synthetic Data for Neural Machine Translation of Spoken-Dialects
In this paper, we introduce a novel approach to generate synthetic data for training Neural Machine Translation systems. The proposed approach transforms a given parallel corpus between a written language and a target language to a parallel corpus between a spoken dialect variant and the target language. Our approach is language independent and can be used to generate data for any variant of th...
Parsing Arabic Dialects
The Arabic language is a collection of spoken dialects with important phonological, morphological, lexical, and syntactic differences, along with a standard written language, Modern Standard Arabic (MSA). Since the spoken dialects are not officially written, it is very costly to obtain adequate corpora to use for training dialect NLP tools such as parsers. In this paper, we address the problem ...
Journal
Journal title: Information Processing & Management
Year: 2019
ISSN: 0306-4573
DOI: 10.1016/j.ipm.2017.08.003